AITopics | robotic manipulation task

Collaborating Authors

robotic manipulation task

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Exploration via Hindsight Goal Generation

Neural Information Processing SystemsDec-25-2025, 10:12:23 GMT

Goal-oriented reinforcement learning has recently been a practical framework for robotic manipulation tasks, in which an agent is required to reach a certain goal defined by a function on the state space. However, the sparsity of such reward definition makes traditional reinforcement learning algorithms very inefficient. Hindsight Experience Replay (HER), a recent advance, has greatly improved sample efficiency and practical applicability for such problems. It exploits previous replays by constructing imaginary goals in a simple heuristic way, acting like an implicit curriculum to alleviate the challenge of sparse reward signal. In this paper, we introduce Hindsight Goal Generation (HGG), a novel algorithmic framework that generates valuable hindsight goals which are easy for an agent to achieve in the short term and are also potential for guiding the agent to reach the actual goal in the long term. We have extensively evaluated our goal generation algorithm on a number of robotic manipulation tasks and demonstrated substantially improvement over the original HER in terms of sample efficiency.

exploration, hindsight goal generation, name change, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.84)

Add feedback

Harnessing Bounded-Support Evolution Strategies for Policy Refinement

Hirschowitz, Ethan, Ramos, Fabio

arXiv.org Artificial IntelligenceNov-17-2025

Improving competent robot policies with on-policy RL is often hampered by noisy, low-signal gradients. We revisit Evolution Strategies (ES) as a policy-gradient proxy and localize exploration with bounded, antithetic triangular perturbations, suitable for policy refinement. We propose Triangular-Distribution ES (TD-ES) which pairs bounded triangular noise with a centered-rank finite-difference estimator to deliver stable, parallelizable, gradient-free updates. In a two-stage pipeline -- PPO pretraining followed by TD-ES refinement -- this preserves early sample efficiency while enabling robust late-stage gains. Across a suite of robotic manipulation tasks, TD-ES raises success rates by 26.5% relative to PPO and greatly reduces variance, offering a simple, compute-light path to reliable refinement.

artificial intelligence, evolutionary algorithm, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2511.09923

Country: North America > United States (0.46)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)

Add feedback

Expertise need not monopolize: Action-Specialized Mixture of Experts for Vision-Language-Action Learning

Shen, Weijie, Liu, Yitian, Wu, Yuhao, Liang, Zhixuan, Gu, Sijia, Wang, Dehui, Nian, Tian, Xu, Lei, Qin, Yusen, Pang, Jiangmiao, Guan, Xinping, Yang, Xiaokang, Mu, Yao

arXiv.org Artificial IntelligenceOct-17-2025

Vision-Language-Action (VLA) models are experiencing rapid development and demonstrating promising capabilities in robotic manipulation tasks. However, scaling up VLA models presents several critical challenges: (1) Training new VLA models from scratch demands substantial computational resources and extensive datasets. Given the current scarcity of robot data, it becomes particularly valuable to fully leverage well-pretrained VLA model weights during the scaling process. (2) Real-time control requires carefully balancing model capacity with computational efficiency. To address these challenges, We propose AdaMoE, a Mixture-of-Experts (MoE) architecture that inherits pretrained weights from dense VLA models, and scales up the action expert by substituting the feedforward layers into sparsely activated MoE layers. AdaMoE employs a decoupling technique that decouples expert selection from expert weighting through an independent scale adapter working alongside the traditional router. This enables experts to be selected based on task relevance while contributing with independently controlled weights, allowing collaborative expert utilization rather than winner-takes-all dynamics. Our approach demonstrates that expertise need not monopolize. Instead, through collaborative expert utilization, we can achieve superior performance while maintaining computational efficiency. AdaMoE consistently outperforms the baseline model across key benchmarks, delivering performance gains of 1.8% on LIBERO and 9.3% on RoboTwin. Most importantly, a substantial 21.5% improvement in real-world experiments validates its practical effectiveness for robotic manipulation tasks.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2510.143

Country: Asia > China (0.46)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation

Yuan, Yifu, Cui, Haiqin, Chen, Yibin, Dong, Zibin, Ni, Fei, Kou, Longxin, Liu, Jinyi, Li, Pengyi, Zheng, Yan, Hao, Jianye

arXiv.org Artificial IntelligenceMay-28-2025

Achieving generalization in robotic manipulation remains a critical challenge, particularly for unseen scenarios and novel tasks. Current Vision-Language-Action (VLA) models, while building on top of general Vision-Language Models (VLMs), still fall short of achieving robust zero-shot performance due to the scarcity and heterogeneity prevalent in embodied datasets. To address these limitations, we propose FSD (From Seeing to Doing), a novel vision-language model that generates intermediate representations through spatial relationship reasoning, providing fine-grained guidance for robotic manipulation. Our approach combines a hierarchical data pipeline for training with a self-consistency mechanism that aligns spatial coordinates with visual signals. Through extensive experiments, we comprehensively validated FSD's capabilities in both "seeing" and "doing," achieving outstanding performance across 8 benchmarks for general spatial reasoning and embodied reference abilities, as well as on our proposed more challenging benchmark VABench. We also verified zero-shot capabilities in robot manipulation, demonstrating significant performance improvements over baseline methods in both SimplerEnv and real robot settings. Experimental results show that FSD achieves 40.6% success rate in SimplerEnv and 72% success rate across 8 real-world tasks, outperforming the strongest baseline by 30%.

arxiv preprint arxiv, large language model, natural language, (16 more...)

arXiv.org Artificial Intelligence

2505.08548

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.49)

Add feedback

FLAME: A Federated Learning Benchmark for Robotic Manipulation

Betran, Santiago Bou, Longhini, Alberta, Vasco, Miguel, Zhang, Yuchong, Kragic, Danica

arXiv.org Artificial IntelligenceMar-3-2025

Recent progress in robotic manipulation has been fueled by large-scale datasets collected across diverse environments. Training robotic manipulation policies on these datasets is traditionally performed in a centralized manner, raising concerns regarding scalability, adaptability, and data privacy. While federated learning enables decentralized, privacy-preserving training, its application to robotic manipulation remains largely unexplored. We introduce FLAME (Federated Learning Across Manipulation Environments), the first benchmark designed for federated learning in robotic manipulation. FLAME consists of: (i) a set of large-scale datasets of over 160,000 expert demonstrations of multiple manipulation tasks, collected across a wide range of simulated environments; (ii) a training and evaluation framework for robotic policy learning in a federated setting. We evaluate standard federated learning algorithms in FLAME, showing their potential for distributed policy learning and highlighting key challenges. Our benchmark establishes a foundation for scalable, adaptive, and privacy-aware robotic learning.

federated learning, learning, robotic manipulation, (13 more...)

arXiv.org Artificial Intelligence

2503.01729

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > Canada > Alberta (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)
Asia > China > Shandong Province > Qingdao (0.04)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (0.87)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Survey on Vision-Language-Action Models

Adilkhanov, Adilzhan, Yelenov, Amir, Seitzhanov, Assylkhan, Mazhitov, Ayan, Abdikarimov, Azamat, Sandykbayeva, Danissa, Kenzhebek, Daryn, Mukashev, Dinmukhammed, Umurbekov, Ilyas, Chumakov, Jabrail, Spanova, Kamila, Burunchina, Karina, Yergibay, Madina, Issa, Margulan, Zabirova, Moldir, Zhuzbay, Nurdaulet, Kabdyshev, Nurlan, Zhaniyar, Nurlan, Yermagambet, Rasul, Chibar, Rustam, Seitzhan, Saltanat, Khajikhanov, Soibkhon, Taunyazov, Tasbolat, Galimzhanov, Temirlan, Kaiyrbay, Temirlan, Mussin, Tleukhan, Syrymova, Togzhan, Kostyukova, Valeriya, Massalim, Yerkebulan, Kassym, Yermakhan, Nurbayeva, Zerde, Kappassov, Zhanat

arXiv.org Artificial IntelligenceFeb-15-2025

This paper presents an AI-generated review of Vision-Language-Action (VLA) models, summarizing key methodologies, findings, and future directions. The content is produced using large language models (LLMs) and is intended only for demonstration purposes. This work does not represent original research, but highlights how AI can help automate literature reviews. As AI-generated content becomes more prevalent, ensuring accuracy, reliability, and proper synthesis remains a challenge. Future research will focus on developing a structured framework for AI-assisted literature reviews, exploring techniques to enhance citation accuracy, source credibility, and contextual understanding. By examining the potential and limitations of LLM in academic writing, this study aims to contribute to the broader discussion of integrating AI into research workflows. This work serves as a preliminary step toward establishing systematic approaches for leveraging AI in literature review generation, making academic knowledge synthesis more efficient and scalable.

large language model, machine learning, vlm, (20 more...)

arXiv.org Artificial Intelligence

2502.06851

Country: Asia (0.46)

Genre:

Research Report (1.00)
Overview (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

Goal State Generation for Robotic Manipulation Based on Linguistically Guided Hybrid Gaussian Diffusion

Xu, Yichen, Chang, Faliang, Liu, Chunsheng, Wang, Dexin

arXiv.org Artificial IntelligenceDec-25-2024

In robotic manipulation tasks, achieving a designated target state for the manipulated object is often essential to facilitate motion planning for robotic arms. Specifically, in tasks such as hanging a mug, the mug must be positioned within a feasible region around the hook. Previous approaches have enabled the generation of multiple feasible target states for mugs; however, these target states are typically generated randomly, lacking control over the specific generation locations. This limitation makes such methods less effective in scenarios where constraints exist, such as hooks already occupied by other mugs or when specific operational objectives must be met. Moreover, due to the frequent physical interactions between the mug and the rack in real-world hanging scenarios, imprecisely generated target states from end-to-end models often result in overlapping point clouds. This overlap adversely impacts subsequent motion planning for the robotic arm. To address these challenges, we propose a Linguistically Guided Hybrid Gaussian Diffusion (LHGD) network for generating manipulation target states, combined with a gravity coverage coefficient-based method for target state refinement. To evaluate our approach under a language-specified distribution setting, we collected multiple feasible target states for 10 types of mugs across 5 different racks with 10 distinct hooks. Additionally, we prepared five unseen mug designs for validation purposes. Experimental results demonstrate that our method achieves the highest success rates across single-mode, multi-mode, and language-specified distribution manipulation tasks. Furthermore, it significantly reduces point cloud overlap, directly producing collision-free target states and eliminating the need for additional obstacle avoidance operations by the robotic arm.

artificial intelligence, machine learning, target state, (18 more...)

arXiv.org Artificial Intelligence

2412.18877

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

Reward Machine Inference for Robotic Manipulation

Baert, Mattijs, Leroux, Sam, Simoens, Pieter

arXiv.org Artificial IntelligenceDec-13-2024

Learning from Demonstrations (LfD) and Reinforcement Learning (RL) have enabled robot agents to accomplish complex tasks. Reward Machines (RMs) enhance RL's capability to train policies over extended time horizons by structuring high-level task information. In this work, we introduce a novel LfD approach for learning RMs directly from visual demonstrations of robotic manipulation tasks. Unlike previous methods, our approach requires no predefined propositions or prior knowledge of the underlying sparse reward signals. Instead, it jointly learns the RM structure and identifies key high-level events that drive transitions between RM states. We validate our method on vision-based manipulation tasks, showing that the inferred RM accurately captures task structure and enables an RL agent to effectively learn an optimal policy.

demonstration, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2412.10096

Country:

Oceania > Australia > Queensland > Brisbane (0.04)
North America > United States > Oregon (0.04)
Europe > Belgium > Flanders > East Flanders > Ghent (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.88)

Add feedback

GSL-PCD: Improving Generalist-Specialist Learning with Point Cloud Feature-based Task Partitioning

Yuan, Xiu

arXiv.org Artificial IntelligenceNov-11-2024

Generalization in Deep Reinforcement Learning across unseen environment variations often requires training over a diverse set of scenarios. However, random task partitioning in GSL can impede specialist performance, as it often assigns vastly different variations to the same specialist, typically resulting in each specialist being assigned just one variation, which increases computational costs. To improve this, we propose Generalist-Specialist Learning with Point Cloud Featurebased Task Partitioning (GSL-PCD). This approach clusters environment variations based on features extracted from object point clouds, using balanced clustering with a greedy algorithm to assign similar variations to the same specialist. Evaluations on robotic manipulation tasks from the ManiSkill benchmark demonstrate that point cloud feature-based partitioning outperforms vanilla partitioning by 9.4% with a fixed number of specialists and reduces computational and sample requirements by 50% to achieve comparable performance.

point cloud, specialist, variation, (9 more...)

arXiv.org Artificial Intelligence

2411.06733

Country: North America > United States > California > San Diego County > San Diego (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.90)

Add feedback

Task-oriented Robotic Manipulation with Vision Language Models

Guran, Nurhan Bulus, Ren, Hanchi, Deng, Jingjing, Xie, Xianghua

arXiv.org Artificial IntelligenceOct-21-2024

Vision-Language Models (VLMs) play a crucial role in robotic manipulation by enabling robots to understand and interpret the visual properties of objects and their surroundings, allowing them to perform manipulation based on this multimodal understanding. However, understanding object attributes and spatial relationships is a non-trivial task but is critical in robotic manipulation tasks. In this work, we present a new dataset focused on spatial relationships and attribute assignment and a novel method to utilize VLMs to perform object manipulation with task-oriented, high-level input. In this dataset, the spatial relationships between objects are manually described as captions. Additionally, each object is labeled with multiple attributes, such as fragility, mass, material, and transparency, derived from a fine-tuned vision language model. The embedded object information from captions are automatically extracted and transformed into a data structure (in this case, tree, for demonstration purposes) that captures the spatial relationships among the objects within each image. The tree structures, along with the object attributes, are then fed into a language model to transform into a new tree structure that determines how these objects should be organized in order to accomplish a specific (high-level) task. We demonstrate that our method not only improves the comprehension of spatial relationships among objects in the visual environment but also enables robots to interact with these objects more effectively. As a result, this approach significantly enhances spatial reasoning in robotic manipulation tasks. To our knowledge, this is the first method of its kind in the literature, offering a novel solution that allows robots to more efficiently organize and utilize objects in their surroundings.

large language model, natural language, spatial relationship, (18 more...)

arXiv.org Artificial Intelligence

2410.15863

Country: Asia > Middle East > Republic of Türkiye (0.04)

Genre: Research Report > Promising Solution (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.96)

Add feedback